14 research outputs found

    ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing

    In this paper, we present a novel unsupervised algorithm for word sense disambiguation (WSD) at the document level. Our algorithm is inspired by a widely used approach in the field of genetics for whole genome sequencing, known as the Shotgun sequencing technique. The proposed WSD algorithm is based on three main steps. First, a brute-force WSD algorithm is applied to short context windows (up to 10 words) selected from the document in order to generate a short list of likely sense configurations for each window. Second, these local sense configurations are assembled into longer composite configurations based on suffix and prefix matching. Finally, the resulting configurations are ranked by their length, and the sense of each word is chosen based on a voting scheme that considers only the top k configurations in which the word appears. We compare our algorithm with other state-of-the-art unsupervised WSD algorithms and demonstrate better performance, sometimes by a very large margin. We also show that our algorithm can yield better performance than the Most Common Sense (MCS) baseline on one data set. Moreover, our algorithm has a very small number of parameters, is robust to parameter tuning, and, unlike other bio-inspired methods, gives a deterministic solution (it does not involve random choices). Comment: In Proceedings of EACL 201
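    The three steps in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the sense inventory, the `score` function, the `RELATED` pairs, and all parameter names are hypothetical stand-ins, since the abstract does not specify how sense configurations are scored.

```python
from collections import Counter
from itertools import product

# Hypothetical toy relatedness data; the abstract does not define the real measure.
RELATED = {("bank_shore", "river_1"), ("river_1", "water_liquid")}

def score(config):
    """Assumed scoring: count adjacent sense pairs listed as related."""
    return sum((a, b) in RELATED for a, b in zip(config, config[1:]))

def local_configs(words, senses, start, n, keep):
    """Step 1: brute-force every sense assignment of one window, keep the best few."""
    window = words[start:start + n]
    candidates = product(*(senses[w] for w in window))
    ranked = sorted(candidates, key=score, reverse=True)[:keep]
    return [(start, tuple(c)) for c in ranked]

def merge(a, b, min_overlap):
    """Join b onto a when a's suffix equals b's prefix on the shared positions."""
    (sa, ca), (sb, cb) = a, b
    overlap = sa + len(ca) - sb
    if sb <= sa or overlap < min_overlap or overlap >= len(cb):
        return None
    if ca[-overlap:] != cb[:overlap]:
        return None
    return (sa, ca + cb[overlap:])

def assemble(configs, min_overlap=1):
    """Step 2: repeatedly merge configurations until no new one appears."""
    pool = set(configs)
    while True:
        new = set()
        for a in pool:
            for b in pool:
                m = merge(a, b, min_overlap)
                if m is not None and m not in pool:
                    new.add(m)
        if not new:
            return pool
        pool |= new

def vote(pool, n_words, k):
    """Step 3: rank by length, then let the top-k covering configurations vote."""
    # The full tuple is a tie-breaker so the ranking is deterministic.
    ranked = sorted(pool, key=lambda c: (-len(c[1]), c))
    result = []
    for i in range(n_words):
        covering = [c for c in ranked if c[0] <= i < c[0] + len(c[1])][:k]
        votes = Counter(c[1][i - c[0]] for c in covering)
        result.append(votes.most_common(1)[0][0] if votes else None)
    return result

def disambiguate(words, senses, n=2, keep=1, k=1):
    configs = []
    for start in range(len(words) - n + 1):
        configs.extend(local_configs(words, senses, start, n, keep))
    return vote(assemble(configs), len(words), k)

words = ["bank", "river", "water"]
senses = {
    "bank": ["bank_finance", "bank_shore"],
    "river": ["river_1"],
    "water": ["water_liquid", "water_verb"],
}
print(disambiguate(words, senses))
```

    In this sketch the two window-level configurations share the sense chosen for "river", so they merge into one configuration spanning all three words, which then decides every vote. The exhaustive pairwise merging is only practical for toy inputs.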

    EDEM3 Domains Cooperate to Perform Its Overall Cell Functioning

    EDEM3 recognizes and directs misfolded proteins to the ER-associated protein degradation (ERAD) process. EDEM3 was predicted to act as a lectin or as a mannosidase because of its homology with the GH47 catalytic domain of Man1B1, but the contribution of the other regions remained unresolved. Here, we dissect the molecular determinants governing EDEM3 function and its cellular interactions. LC/MS analysis indicates very few stable ER interactors, suggesting that EDEM3 is available for transient substrate interactions. Sequence analysis reveals that EDEM3 consists of four consecutive modules, defined as the GH47, intermediate (IMD), protease-associated (PA), and intrinsically disordered (IDD) domains. Using an EDEM3 knock-out cell line, we expressed EDEM3 and domain deletion mutants to address EDEM3 function. We find that the mannosidase domain provides substrate binding even in the absence of mannose trimming and requires the IMD domain for folding. Deletions of the PA and IDD domains do not impair trimming, but specifically modulate the turnover of two misfolded proteins, NHK and the soluble tyrosinase mutant. Hence, we demonstrate that EDEM3 provides a unique ERAD timing to misfolded glycoproteins, not only by its mannose trimming activity, but also by the positive and negative feedback modulated by the protease-associated and intrinsically disordered domains, respectively.

    Isolation of Circulating Tumor Cells from Seminal Fluid of Patients with Prostate Cancer Using Inertial Microfluidics

    Prostate cancer (PCa) diagnosis is primarily based on prostate-specific antigen (PSA) testing and prostate tissue biopsies. However, PSA testing has relatively low specificity, while tissue biopsies are highly invasive and have relatively low sensitivity at early stages of PCa. As an alternative, we developed a liquid biopsy technique based on the isolation of circulating tumor cells (CTCs) from seminal fluid (SF). The recovery of PCa cells from SF was demonstrated using PCa cell lines, achieving an efficiency and throughput as high as 89% (±3.8%) and 1.7 mL min⁻¹, respectively, while 99% (±0.7%) of sperm cells were removed. The approach was further tested in a clinical setting by collecting and processing SF samples from PCa patients. The yield of isolated CTCs was as high as 613 cells per SF sample, compared with 6 cells from the SF of healthy donors, holding significant promise for PCa diagnosis. Correlation analysis of the isolated CTC numbers against standard prognostic parameters, Gleason score and PSA serum level, gave correlation coefficients of 0.40 and 0.73, respectively. Taken together, our results show that the developed liquid biopsy technique holds promise for augmenting the existing diagnosis and prognosis of PCa.